ARBitrator: a software pipeline for on-demand retrieval of auto-curated nifH sequences from GenBank

Authors

  • Philip Heller
  • H. James Tripp
  • Kendra Turk-Kubo
  • Jonathan P. Zehr
Abstract

MOTIVATION: Studies of the biochemical functions and activities of uncultivated microorganisms in the environment require analysis of DNA sequences, both for phylogenetic characterization and for developing sequence-based assays to detect these microorganisms. The number of sequences for genes that indicate environmentally important functions, such as nitrogen (N2) fixation, has grown rapidly over the past few decades. Retrieving these sequences from the National Center for Biotechnology Information's GenBank database is problematic because of annotation errors, nomenclature variation and paralogues; moreover, GenBank's structure and tools are not conducive to searching solely by function. For some genes, such as nifH, which is commonly used to assess a community's potential for N2 fixation, manual collection and curation are becoming intractable because GenBank contains a large number of these sequences as well as many highly similar paralogues. If analysis is to keep pace with sequence discovery, an automated retrieval and curation system is necessary.

RESULTS: ARBitrator uses a two-step process: a broad collection of potential homologues, followed by screening with a best-hit strategy against conserved domains. As of November 20, 2012, 34,420 nifH sequences were identified in GenBank. The false-positive rate is ∼0.033%. ARBitrator rapidly updates a public nifH sequence database, and we show that it can be adapted for other genes.

AVAILABILITY AND IMPLEMENTATION: Java source and executable code are freely available to non-commercial users at http://pmc.ucsc.edu/~wwwzehr/research/database/.

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
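The two-step process summarized under RESULTS can be illustrated with a minimal Python sketch. This is not ARBitrator's Java implementation; it assumes each candidate sequence has already been scored against a set of conserved-domain models, and the domain labels ("NifH", "ChlL"), scores and threshold below are hypothetical. Step 1 (broad collection) keeps any sequence with at least one domain hit above a loose threshold; step 2 (best-hit screening) keeps only candidates whose single best-scoring domain is the target, which is how a best-hit rule can separate nifH from close paralogues.

```python
from typing import Dict, Set

# Hypothetical input: sequence ID -> {conserved-domain label: alignment score}.
Hits = Dict[str, Dict[str, float]]


def broad_collect(hits: Hits, min_score: float) -> Set[str]:
    """Step 1 (assumed form): keep any sequence with at least one domain hit >= min_score."""
    return {
        seq_id
        for seq_id, scores in hits.items()
        if any(score >= min_score for score in scores.values())
    }


def best_hit_screen(hits: Hits, candidates: Set[str], target: str) -> Set[str]:
    """Step 2: keep candidates whose single best-scoring domain is the target domain."""
    kept = set()
    for seq_id in candidates:
        scores = hits[seq_id]
        # max() with key picks the highest-scoring domain label;
        # on exact ties it returns the first label in dict order.
        best_domain = max(scores, key=scores.get)
        if best_domain == target:
            kept.add(seq_id)
    return kept


# Hypothetical example data (labels and scores are illustrative only).
hits: Hits = {
    "seqA": {"NifH": 310.0, "ChlL": 180.0},  # best hit is the target domain
    "seqB": {"NifH": 150.0, "ChlL": 240.0},  # best hit is a paralogue domain
    "seqC": {"NifH": 12.0},                  # too weak to be collected
}

candidates = broad_collect(hits, min_score=50.0)
nifh_hits = best_hit_screen(hits, candidates, target="NifH")
print(sorted(candidates))
print(sorted(nifh_hits))
```

Here seqB passes the broad collection but is rejected at screening because its best hit is the paralogue label, while seqC never becomes a candidate. A production pipeline would also need an explicit policy for score ties, which this sketch resolves only by dictionary order.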


Similar resources

Transterm: a database to aid the analysis of regulatory sequences in mRNAs

Messenger RNAs, in addition to coding for proteins, may contain regulatory elements that affect how the protein is translated. These include protein and microRNA-binding sites. Transterm (http://mRNA.otago.ac.nz/Transterm.html) is a database of regions and elements that affect translation with two major unique components. The first is integrated results of analysis of general features that affe...


Computation Optical Flow Using Pipeline Architecture

Accurate estimation of motion from time-varying imagery has been a popular problem in vision studies, This information can be used in segmentation, 3D motion and shape recovery, target tracking, and other problems in scene analysis and interpretation. We have presented a dynamic image model for estimating image motion from image sequences, and have shown how the solution can be obtained from a ...


Phylogeny of gazelles in some islands of Iran based on mtDNA sequences: Species identification and implications for conservation

Different species of gazelles are among the most endangered mammals on the Asian steppes and occur in the central, southern and northwestern regions of Iran. The previous conservation efforts in this region have been incomplete due to confusion about the phylogenetic relationship among various populations. So that, different conservation programs such as ex-situ breeding and transfer of captive...


DIYA: a bacterial annotation pipeline for any genomics lab

UNLABELLED DIYA (Do-It-Yourself Annotator) is a modular and configurable open source pipeline software, written in Perl, used for the rapid annotation of bacterial genome sequences. The software is currently used to take DNA contigs as input, either in the form of complete genomes or the result of shotgun sequencing, and produce an annotated sequence in Genbank file format as output. AVAILABI...


APPLIED OF IMPRESSED CURRENT CATHODIC PROTECTION DESIGN FOR FUEL PIPELINE NETWORK AT NAVAL BASE

Indonesian Navy (TNI AL) is the main component for Maritime Security and Defence. Because of that, TNI AL needs Indonesian Warship (KRI) to covered Maritime area. The main requirement from KRI is fulfilled by demand. To pock of fuel demand from KRI at Naval Base, it needs a new pipeline of fuel distribution network system. The pipeline network system used for maximum lifetime must be protected ...


Journal title:
  • Bioinformatics

Volume 30, Issue 20

Pages: -

Publication date: 2014